Unify SolrIndexSearcher paths with CollectorManager integration by markrmiller · Pull Request #4487 · apache/solr

markrmiller · 2026-06-01T02:22:06Z

Replaces the dual sequential/parallel branches in getDocListNC and getDocListAndSetNC with a single private searchAndCollect helper that internally selects between single-slice and multi-slice execution. The top-level methods no longer fork on MultiThreadedSearcher.allowMT. Also makes canceling a query when doing parallel segment search actually do something.

Performance — this commit vs parent

1.5M docs / 150 segments, 300 samples per cell per build, combined from two independent round-robin sessions (parent and this commit runs alternated within each session against a single shared solr-home). 30 warmup + 30 timed runs per cell per round. sudo powermetrics --samplers cpu_power,thermal ran in parallel at 1s cadence for both sessions; thermal verification below.

cell                                                               parent ms±se     this commit   ms±se    delta%        z
-----------------------------------------------------------------------------------------------------------------------------
match-all-top500           baseline    mt_false                  17.98±0.042      17.72±0.025      -1.4%     -5.4
match-all-top500           baseline    mt_true                   17.85±0.039      17.73±0.026      -0.7%     -2.7
match-all-top500           ste         mt_false                  10.19±0.027      10.09±0.022      -1.0%     -2.8
match-all-top500           ste         mt_true                   10.13±0.021      10.10±0.023      -0.3%     -0.8
match-all-top500           postfilter  mt_false                  25.54±0.045      25.78±0.083      +0.9%     +2.5
match-all-top500           postfilter  mt_true                   25.41±0.045      25.62±0.033      +0.8%     +3.8
match-all-top500           both        mt_false                  19.94±0.044      20.23±0.029      +1.5%     +5.6
match-all-top500           both        mt_true                   20.12±0.080      20.23±0.035      +0.5%     +1.2

big-bool-or-top1000        baseline    mt_false                  24.76±0.053      24.88±0.038      +0.5%     +1.8
big-bool-or-top1000        baseline    mt_true                   26.31±0.046      26.31±0.038      -0.0%     -0.1
big-bool-or-top1000        ste         mt_false                   5.66±0.036       5.58±0.020      -1.5%     -2.1
big-bool-or-top1000        ste         mt_true                    5.61±0.027       5.49±0.020      -2.2%     -3.6
big-bool-or-top1000        postfilter  mt_false                  30.49±0.040      30.55±0.040      +0.2%     +1.0
big-bool-or-top1000        postfilter  mt_true                   30.47±0.041      30.58±0.036      +0.4%     +2.0
big-bool-or-top1000        both        mt_false                   6.96±0.023       6.89±0.021      -1.0%     -2.3
big-bool-or-top1000        both        mt_true                    6.89±0.037       6.98±0.022      +1.3%     +2.1

sort-by-rank-top500        baseline    mt_false                  17.73±0.025      17.83±0.026      +0.6%     +2.9
sort-by-rank-top500        baseline    mt_true                   17.76±0.025      17.78±0.027      +0.1%     +0.6
sort-by-rank-top500        ste         mt_false                  17.40±0.021      17.37±0.024      -0.2%     -0.9
sort-by-rank-top500        ste         mt_true                   17.39±0.022      17.37±0.024      -0.2%     -0.9
sort-by-rank-top500        postfilter  mt_false                  25.86±0.065      25.67±0.034      -0.8%     -2.7
sort-by-rank-top500        postfilter  mt_true                   25.74±0.030      25.79±0.080      +0.2%     +0.6
sort-by-rank-top500        both        mt_false                  23.98±0.030      23.90±0.028      -0.3%     -2.0
sort-by-rank-top500        both        mt_true                   23.93±0.027      23.96±0.030      +0.1%     +0.7

big-bool-or-score-top500   baseline    mt_false                  54.93±0.070      55.05±0.110      +0.2%     +0.9
big-bool-or-score-top500   baseline    mt_true                   53.37±0.063      53.54±0.088      +0.3%     +1.5
big-bool-or-score-top500   ste         mt_false                  19.49±0.072      19.60±0.112      +0.5%     +0.8
big-bool-or-score-top500   ste         mt_true                   19.52±0.161      19.50±0.073      -0.1%     -0.1
big-bool-or-score-top500   postfilter  mt_false                  59.68±0.167      59.63±0.065      -0.1%     -0.3
big-bool-or-score-top500   postfilter  mt_true                   59.42±0.056      59.69±0.102      +0.4%     +2.3
big-bool-or-score-top500   both        mt_false                  20.41±0.174      20.66±0.252      +1.2%     +0.8
big-bool-or-score-top500   both        mt_true                   20.15±0.039      20.20±0.044      +0.2%     +0.8

Negative delta = this commit faster than parent.

sudo powermetrics --samplers cpu_power,thermal -i 1000 ran in parallel for both sessions (~13 min each).

build	sessions	runs	P-cluster avg MHz	CPU_W avg	thermal pressure
parent	2	10	P0 3041, P1 3060	8.27	Nominal (every sample)
patched	2	10	P0 3051, P1 3055	8.25	Nominal (every sample)

No Fair/Serious/Critical events for any sample in any run across either session.

…ased entry point Replaces the dual sequential/parallel branches in getDocListNC and getDocListAndSetNC with a single private searchAndCollect helper that internally selects between single-slice and multi-slice execution. The top-level methods no longer fork on MultiThreadedSearcher.allowMT. Key changes: - Extracts TopDocsCM, MaxScoreCM, DocSetCM, FixedBitSetCollector, SearchResult, TopDocsResult, and MaxScoreResult into a new SolrCollectorManagers utility. Adds PostFilterCM<T> (single-slice; setLastDelegate per newCollector, complete() at reduce) and DocSetCollectorCM (preserves the SortedIntDocSet sparse optimization). Multi-slice paths keep the FixedBitSet-based DocSetCM. - Deletes MultiThreadedSearcher; allowMT is inlined as the canParallelize check inside searchAndCollect. - Migrates the collector chain (segment-terminate-early, max-hits early termination, optional post-filter, optional cancellation) to a CollectorManager-based ChainCM applied per newCollector(). Single-slice vs multi-slice is chosen by ChainCM.requiresSingleSlice() OR the caller's parallelEligible hint. - Replaces the outer CancellableCollector with a shared AtomicBoolean polled by every per-slice CancellableCollector, so a single tracker.cancel(queryID) is observed across all parallel slices. - ChainCM.getScoreMode() returns the chain-wrapped collector's score mode so a Solr post-filter override (e.g. CollapsingPostFilter forcing COMPLETE) reaches populateScoresIfNeeded. Guaranteed non-null at the call site: Lucene's IndexSearcher.search calls newCollector() unconditionally before its empty-index short-circuit, so the call site uses Objects.requireNonNull rather than an unreachable fallback. - TopDocsCM.reduce skips TopDocs.merge for a single input, preserving the underlying TopDocsCollector's EQUAL_TO / GREATER_THAN_OR_EQUAL_TO relation exactly. - Aligns SolrMultiCollectorManager with Lucene's MultiCollector: per-child CollectionTerminatedException handling, per-child setMinCompetitiveScore aggregation (MinCompetitiveScoreAwareScorable), and removal of the EarlyTerminatingCollector wrap (ChainCM owns it, with a shared LongAdder across slices to preserve cross-slice maxHitsAllowed semantics). Clamps EarlyTerminatingCollector chunkSize to >= 1 to avoid divide-by-zero when maxDocsToCollect < 10. A few hot-path alignments with Lucene's MultiCollector pattern that together avoid a regression on heavy multi-term boolean OR queries: - searchAndCollect skips SolrMultiCollectorManager entirely when only one inner CollectorManager is built (top-N without fl=score and without docSet — the common case), routing through a new SolrCollectorManagers.wrapSingle adapter. Mirrors Lucene's own MultiCollectorManager.wrap() short-circuit for the one-collector case. - SolrMultiCollectorManager$Collectors$LeafCollectors.collect now matches MultiCollector.MultiLeafCollector#collect exactly: the per-iteration body is load-slot / null-check / call, and "all children terminated" is detected lazily inside the rare CollectionTerminatedException catch via allLeavesTerminated(). - QueryCancelledException and EarlyTerminatingCollectorException override fillInStackTrace() to return this, matching CollectionTerminatedException — these are control-flow exceptions and should not pay for stack-trace fill. No behavior change: numFound, ranked doc_ids, maxScore, numFoundExact and start match the parent commit on every probed (query, mode, multiThreaded) cell, including post-filter, STE, and post-filter + STE combinations. Verified with the full org.apache.solr.search.* suite plus targeted runs for TestMultiThreadedSearcher, TestEarlyTerminatingQueries, TestQueryLimits, CursorPagingTest, TestRankQueryPlugin, TestCollapseQParserPlugin, TestRandomCollapseQParserPlugin, and TestReRankQParserPlugin.

dsmiley

Very impressive... this is really an overhaul of multithreaded collection/search, not just a unification.

Glad to see benchmarks! But I'm surprised to see that solr/benchmark (JMH) wasn't used... why not? I'm also having trouble interpreting what you posted.

100 segments is a lot; not very realistic

dsmiley · 2026-06-02T04:03:51Z

+    }
+  }
+
+  static class FixedBitSetCollector extends SimpleCollector {


could use a bit more docs/comments to explain what's going on

dsmiley · 2026-06-02T05:34:06Z

surprisingly a net increase in LOC ... but more documentation so that's good

HoustonPutman · 2026-06-03T20:15:32Z

So it seems this copies a lot of code from Lucene. What needs to change in Lucene's MultiCollectorManager and MultiCollector such that we would be able to extend their classes without sacrificing our functionality?

github-actions Bot added tests cat:search labels Jun 1, 2026

markrmiller requested a review from HoustonPutman June 1, 2026 02:22

dsmiley reviewed Jun 2, 2026

View reviewed changes

HoustonPutman reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify SolrIndexSearcher paths with CollectorManager integration#4487

Unify SolrIndexSearcher paths with CollectorManager integration#4487
markrmiller wants to merge 1 commit into
apache:mainfrom
markrmiller:unify-searcher-collector-manager

markrmiller commented Jun 1, 2026

Uh oh!

dsmiley left a comment

Uh oh!

dsmiley Jun 2, 2026

Uh oh!

dsmiley Jun 2, 2026

Uh oh!

HoustonPutman Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

markrmiller commented Jun 1, 2026

Performance — this commit vs parent

Uh oh!

dsmiley left a comment

Choose a reason for hiding this comment

Uh oh!

dsmiley Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

dsmiley Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

HoustonPutman Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants